## Benchmark

This folder contains scripts and policies for benchmarking model inference and training, with a focus on distributed and parallel settings.  
Key components include:

- **Inference/**: Scripts for evaluating inference speed and communication overhead (e.g., `Allreduce_FAL.py`, `Allreduce_GPT2.py`).
- **Train/**: Scripts for benchmarking training throughput and efficiency, including the learning-rate (`LR.py`) and quantization (`QT.py`) scripts.
- **Policy/** (inside both `Inference/` and `Train/`): Custom policies for distributed execution.

Use these scripts to measure and compare the efficiency of FAL, FAL+, and baseline transformer models under various parallelism strategies.

## File Descriptions

| Folder/File                              | Description                                                                                     |
|------------------------------------------|-------------------------------------------------------------------------------------------------|
| `Inference/Allreduce_FAL.py`             | Script for evaluating inference speed with FAL using All-reduce communication.                  |
| `Inference/Allreduce_GPT2.py`            | Script for evaluating inference speed with GPT-2 using All-reduce communication.                |
| `Inference/Policy/FAL_forwards.py`       | Implements forward pass logic for FAL models.                                                  |
| `Inference/Policy/FAL_policy.py`         | Defines distributed execution policies for FAL models.                                         |
| `Train/Allreduce_FAL.py`                 | Script for benchmarking training throughput with FAL using All-reduce communication.            |
| `Train/Allreduce_GPT2.py`                | Script for benchmarking training throughput with GPT-2 using All-reduce communication.          |
| `Train/LR.py`                            | Script for evaluating learning rate schedules during training.                                 |
| `Train/QT.py`                            | Script for benchmarking quantization techniques during training.                               |
| `Train/Policy/LR_policy.py`              | Implements learning rate policies for distributed training.                                    |
| `Train/Policy/QT_policy.py`              | Implements quantization policies for distributed training.                                     |
| `Train/Policy/Reduce_backward_LR.py`     | Implements backward pass reduction logic for learning rate policies.                          |
| `Train/Policy/Reduce_backward_QT.py`     | Implements backward pass reduction logic for quantization policies.                           |

## How to Run

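The examples below launch each script with `colossalai run`. Replace `?` with the number of processes to start on the node (typically one per GPU).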

### Inference Benchmark
To evaluate inference speed with FAL:
```bash
colossalai run --nproc_per_node ? Inference/Allreduce_FAL.py
```

To evaluate inference speed with GPT-2:
```bash
colossalai run --nproc_per_node ? Inference/Allreduce_GPT2.py
```

### Training Benchmark
To benchmark training throughput with FAL:
```bash
colossalai run --nproc_per_node ? Train/Allreduce_FAL.py
```

To benchmark training throughput with GPT-2:
```bash
colossalai run --nproc_per_node ? Train/Allreduce_GPT2.py
```
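To compare results across parallelism degrees, any of the commands above can be repeated at several process counts. The following is a minimal sketch, not part of this repository: `build_cmd`, `sweep`, and the `DRY_RUN` variable are hypothetical names introduced here, and the process counts `1 2 4` are placeholder values to adjust to your hardware. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: launch one benchmark script at several process counts.
# build_cmd, sweep, and DRY_RUN are hypothetical names for this example.

# Build the launch command for a given process count and script path.
build_cmd() {
  local nproc="$1" script="$2"
  echo "colossalai run --nproc_per_node ${nproc} ${script}"
}

# Print (DRY_RUN=1, the default) or execute (DRY_RUN=0) the command
# for each process count passed after the script path.
sweep() {
  local script="$1"; shift
  local nproc cmd
  for nproc in "$@"; do
    cmd="$(build_cmd "${nproc}" "${script}")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "${cmd}"
    else
      ${cmd} || return 1  # stop the sweep if a run fails
    fi
  done
}

# Placeholder process counts; pick values that match your GPU count.
sweep Inference/Allreduce_FAL.py 1 2 4
```

Set `DRY_RUN=0` to actually launch the runs, and swap in `Inference/Allreduce_GPT2.py` or the `Train/` scripts to sweep the other benchmarks.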

Ensure the required dependencies are installed: `torch` (which includes `torch.distributed`), `transformers`, and `colossalai`, which provides the `colossalai run` launcher used above. A multi-GPU setup is recommended for the distributed benchmarks.
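A quick preflight check before launching can catch a missing launcher early. This is a rough sketch under stated assumptions: `missing_cmds` is a hypothetical helper, and checking for `nvidia-smi` assumes NVIDIA GPUs. It only verifies that the commands are on `PATH`, not that the Python packages import correctly.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given commands are not on PATH.
# missing_cmds is a hypothetical helper, not part of this repository.
missing_cmds() {
  local cmd
  for cmd in "$@"; do
    # command -v succeeds only if the shell can resolve the name.
    command -v "${cmd}" >/dev/null 2>&1 || echo "${cmd}"
  done
}

# Assumption: NVIDIA GPUs, so nvidia-smi is the tool that lists devices.
missing="$(missing_cmds colossalai nvidia-smi | tr '\n' ' ')"
if [ -n "${missing}" ]; then
  echo "Not found on PATH: ${missing}" >&2
else
  echo "Launcher and GPU tooling found."
fi
```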



